Video Content Valuation Prediction Using A Prediction Network

ABSTRACT

In some embodiments, a method receives a plurality of inputs for a video for a plurality of times at a prediction network that includes a plurality of cells. The prediction network generates a plurality of predictions of watch behavior of the video for the plurality of inputs at the plurality of cells. The plurality of predictions predicts a performance of the video on a video delivery service for the plurality of times. Actual performance data generated from users viewing the video on the video delivery service is received before a time. A time series residual for at least a portion of the plurality of predictions is generated from the actual performance data and prior predictions. The portion of the predictions after the time is adjusted using values in the time series residual. The adjusted predictions of watch behavior are output for the video.

BACKGROUND

A video delivery service may want to predict the performance of a video in the future, such as a weekly watch hour percentage for the next five years. The weekly watch hour percentage may predict the percentage of hours the video is watched by users over the total watch hours. Watch hour percentage may be defined as y_title/y_all over a time window, where y_title is the total watch hours for a specific video watched by all possible viewers over the time window, and y_all is the total watch hours for all videos watched by all viewers on the video delivery service. The window can be weekly, monthly, or yearly, and the metric is referred to as percentage hours (PH). The video delivery service may use another metric called percentage cost (PC) to measure the cost of each video. The efficiency of a title is assessed by the ratio PH/PC. The video delivery service has a threshold for the efficiency. If the efficiency is lower than the threshold, the video delivery service may have a set of policies on whether or how to purchase a title for the video.
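As a small worked illustration of the PH and efficiency definitions above, the following Python sketch computes PH for one time window and compares PH/PC with a threshold. The function names, the example hours, the PC value, and the threshold of 1.0 are hypothetical and chosen only for illustration.

```python
def watch_hour_percentage(title_hours: float, all_hours: float) -> float:
    """PH for one time window: y_title / y_all, expressed as a percentage."""
    if all_hours <= 0:
        raise ValueError("total watch hours must be positive")
    return 100.0 * title_hours / all_hours


def efficiency(ph: float, pc: float) -> float:
    """Efficiency of a title, assessed as the ratio PH / PC."""
    if pc <= 0:
        raise ValueError("percentage cost must be positive")
    return ph / pc


def below_threshold(ph: float, pc: float, threshold: float) -> bool:
    """True when the title's efficiency falls below the service's threshold."""
    return efficiency(ph, pc) < threshold


# Hypothetical week: 5,000 hours on the title out of 5,000,000 total hours.
ph = watch_hour_percentage(5_000, 5_000_000)  # about 0.1 (percent)
print(ph, efficiency(ph, 0.08), below_threshold(ph, 0.08, threshold=1.0))
```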

The prediction has multiple challenges. For example, the video delivery service may want to perform the prediction as soon as possible after the video launches, but it also wants the prediction to be as accurate as possible. Generally, performing the prediction soon after the video launches does not achieve high accuracy because the video delivery service does not have any information regarding the performance of the video on the video delivery service (e.g., the number of watch hours) to perform the prediction.

If the video's performance is predicted upon the initial launch of the video on the video delivery service, the video delivery service may use a linear or non-linear regression analysis to predict the future performance of the video. The regression analysis models the performance using a function that assumes independence between the inputs to the prediction. However, weekly watch behavior on the video delivery service may exhibit a strong sequential correlation. That is, the current week's watch behavior may be correlated with the previous week's watch behavior. Regression models ignore this dependency structure and treat each prediction independently, which may result in an inaccurate prediction of the performance of the video on the video delivery service.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a simplified system of a method for predicting video performance according to some embodiments.

FIG. 2 depicts a more detailed example of a prediction network according to some embodiments.

FIG. 3 depicts a more detailed example of a cell according to some embodiments.

FIG. 4 depicts an example of a neuron according to some embodiments.

FIG. 5 shows an example of a prediction according to some embodiments.

FIG. 6 shows an example of the altering of the initial prediction according to some embodiments.

FIG. 7 depicts an example of the residuals that are calculated according to some embodiments.

FIG. 8 depicts a video streaming system in communication with multiple client devices via one or more communication networks according to one embodiment.

FIG. 9 depicts a diagrammatic view of an apparatus for viewing video content and advertisements.

DETAILED DESCRIPTION

Described herein are techniques for a video prediction system. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of some embodiments. Some embodiments as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below and may further include modifications and equivalents of the features and concepts described herein.

Some embodiments use a two-stage prediction process to predict a video's future performance, such as a monthly or weekly watch percentage for the video. The prediction may use external or exogenous information from different sources. The prediction of the performance may use sequential correlation when generating the prediction; for example, the weekly watch percentage prediction may be correlated with the previous watch percentage prediction.

In stage one of the prediction process, a prediction network receives the current exogenous information as input and also uses information from a previous prediction (or predictions) as input. The prediction thus takes into account the sequential dependence that exists when generating the prediction. Then, the prediction network outputs the prediction.

Conventionally, a prediction network that takes into account the previous predictions as input may be configured to output a single prediction. For example, the prediction network may be configured to perform a classification, such as classifying the sentiment of a sentence, where the input is a sequence of words and the output is a positive or negative sentiment label. In these examples, multiple inputs are classified into a single output. However, a video delivery service may require that the prediction network output multiple predictions for different time periods, such as weekly predictions. This requires the prediction network to be a many-to-many network, which receives multiple inputs and generates multiple outputs. The prediction network generates a sequence of predictions given a sequence of input data.

The prediction may be performed relatively early in offering a video on the video delivery service, such as before or at the launch of the video. For example, the prediction can be generated before statistics of actual watch behavior on the video delivery service are received. After the video is released on the video delivery service, statistics for the actual watch behavior are received. Stage two uses the actual behavior on the video delivery service, such as the watch behavior (e.g., hours of time spent watching the video), to adjust future predictions (e.g., predictions for future time periods for which actual watch behavior has not been received). The adjustment analyzes the difference between the predicted performance and the actual performance and then can adjust the future predictions based on the difference. Given that the difference between the predicted performance and the actual performance changes over time, stage two interpolates the difference over time to alter future predictions at other times.

Using stage two simplifies the prediction network because the prediction network can be designed to predict the performance of the video without using statistics for the actual watch behavior. This also allows the video delivery service to generate the predicted performance when the video is launched on the video delivery service and then adjust the predictions at a later time. Further, if the prediction network were configured to re-generate the prediction after generating the initial prediction, the logic of the prediction network would have to be changed to use the actual watch behavior after the initial prediction.

System Overview

FIG. 1 depicts a simplified system 100 of a method for predicting video performance according to some embodiments. System 100 includes a server system 102 and clients 104.

A video delivery service may deliver videos to users using video delivery system 110. For example, video delivery system 110 delivers videos to clients 104 that request the videos. Different users may watch videos during a time period. This provides an actual watch behavior that may summarize the viewing of the video during the time period; for example, the weekly watch behavior may be the percentage of hours the video is watched by users over the total watch hours on the video delivery service. The actual watch behavior may be used later in stage two of the prediction, as discussed below.

Videos may include different types of videos, such as shows that release episodes, movies, and shorts. The watch behavior of a video on the video delivery service may vary; that is, the watch behavior may not decay linearly. For example, a video may be a show that releases episodes weekly, monthly, or seasonally. This may affect the watch behavior, as users may watch a show more when new episodes are released or about to be released. Also, other episodes of a show may see increased viewing when new episodes are released or a new season starts. This introduces variability in the watch behavior that does not follow a constant or regressive decay.

The video delivery service may attempt to predict the performance of videos, which can be used by the video delivery service when providing services to users, such as when forecasting the performance of a video or evaluating licensing deals for a video. The prediction may be performed before or when a video is released on the video delivery service. In this case, the video delivery service does not have any actual watch behavior information of users watching the video on the video delivery service. The video delivery service may also make a prediction at other times, such as after the release of the video. However, in some examples, the video delivery service may not use any watch behavior when making the prediction. Starting the prediction as soon as possible may be needed, but the video delivery service also wants the prediction to be as accurate as possible. If the video delivery service used actual watch data to generate the prediction with reasonable accuracy, it would have to wait until enough data is received. For this type of prediction, the amount of data needed may be large, such as two years of data. However, the video delivery service may need the prediction before the video starts streaming on the video delivery service or soon thereafter.

Server system 102 includes a prediction network 106 that can predict the performance of the video on the video delivery service. For example, prediction network 106 may output watch behavior, such as a weekly or monthly watch behavior percentage. The performance may also be quantified by other factors, such as a number of video views. Prediction network 106 may receive a series of inputs and output multiple predictions for multiple time periods. That is, prediction network 106 does not output a single prediction but outputs a series of predictions for the series of inputs.

Prediction network 106 may make an initial prediction using available exogenous information associated with each video. The exogenous information is information that is different from the actual metric being predicted and may be based on factors outside of the video delivery service or on information derived from within the video delivery service. The exogenous information may be input into prediction network 106 as exogenous variables, which may be features or explanatory variables that are different from the output variable that is being predicted. For example, the exogenous information may not include the watch behavior if the watch behavior is being predicted. Some examples of exogenous information include the video's launch date in weekly intervals over a period of five years. For example, a show may include multiple launch dates over weekly intervals for episodes. Other exogenous information includes the video deal data, content metadata, and temporal data. The video deal data may include whether or not the content has a contributor license agreement (CLA) license. The content metadata may be metadata associated with the video, such as the number of days from the video's premiere, an indicator as to whether full episodes of a season are available or only partially available, the content provider (e.g., the television (TV) network), the genre of the video, the number of episodes released for the video, and the episode length. The temporal data may include data that is associated with time, such as the number of weeks since the launch date on the video delivery service, the number of weeks since the premiere date of the show, and the month or the week in which the prediction is made. If the prediction is made on the launch date, then the value for the number of weeks since the launch date would be zero. If the prediction is for two weeks in the future, then the value for the number of weeks since the launch date would be two. The above list of features is just an example, and other exogenous information may be used.
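The exogenous inputs can be organized as one feature row per predicted week. The sketch below assumes a hypothetical `WeeklyFeatures` record holding a subset of the features listed above; the field names and the episode-schedule format (week offsets from launch) are illustrative, not taken from the described system. As in the text, `weeks_since_launch` is 0 for the launch week and 2 for a week two weeks out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class WeeklyFeatures:
    # Temporal data
    weeks_since_launch: int
    weeks_since_premiere: int
    month: int
    # Deal data and content metadata (hypothetical subset)
    has_cla: bool
    genre: str
    episodes_released: int


def build_weekly_inputs(launch: date, premiere: date, n_weeks: int, has_cla: bool,
                        genre: str, episode_weeks: List[int]) -> List[WeeklyFeatures]:
    """One exogenous input row X_t per week, starting at the launch week."""
    rows = []
    for k in range(n_weeks):
        week_start = launch + timedelta(weeks=k)
        rows.append(WeeklyFeatures(
            weeks_since_launch=k,
            weeks_since_premiere=(week_start - premiere).days // 7,
            month=week_start.month,
            has_cla=has_cla,
            genre=genre,
            # Episodes released on or before this week (offsets from launch).
            episodes_released=sum(1 for w in episode_weeks if w <= k),
        ))
    return rows


rows = build_weekly_inputs(date(2019, 1, 7), date(2018, 12, 31), 3, True, "drama", [0, 2])
```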

Prediction network 106 may receive the exogenous information from video information 114 in storage 116. For example, the video delivery service may collect the exogenous information and store it as video information 114 for the various videos offered by the video delivery service. Some embodiments input the exogenous information into multiple inputs of prediction network 106 to predict the performance at multiple outputs. Prediction network 106 may perform the prediction using the correlation between a prior prediction, such as a previous week's watch behavior, and a current prediction being made, such as a current week's watch behavior. This differs from a linear or non-linear regression scenario, which assumes independence among the input variables of the exogenous information, because prediction network 106 also receives previous predictions as input. Prediction network 106 then generates the prediction for a time period using two types of information: the exogenous information and the previous prediction. For example, if weekly predictions are being made, the prediction for the second week may use the exogenous information for the second week and information from the prediction for the first week. The exogenous information for the second week may be different from that for the first week; for example, the temporal information for the second week is different from the temporal information for the first week. Further, the prediction for the third week may receive exogenous information for the third week in addition to the prediction from the second week. The predictions from other previous weeks may also be included, such as from the first week. The exogenous information for the third week may also be different from the exogenous information for the first and second weeks; for example, the temporal information may change.

Prediction network 106 may have a many-to-many architecture, which means that multiple inputs are received at prediction network 106 and prediction network 106 generates multiple outputs. This allows prediction network 106 to generate a sequence of predictions given a sequence of input data, such as a sequence of weekly exogenous information. Prediction network 106 then predicts a sequence of outputs, such as weekly watch percentages. The operation of prediction network 106 is described in more detail below.

A prediction correction engine 108 receives the predictions from prediction network 106. In some embodiments, the prediction is performed at a certain time, such as at the launch of the video on the video delivery service, and is not performed again. As the actual performance of the video, such as the actual weekly watch percentage, is received, the video delivery service may be able to evaluate the accuracy of the prediction. For example, prediction correction engine 108 may receive the actual performance of the video (e.g., weekly watch behavior) from video delivery system 110 and then adjust the prediction going forward from a time, such as the current time. The adjusted prediction from prediction correction engine 108 allows the video delivery service to generate a more accurate prediction than the prediction first generated by prediction network 106. The adjusted prediction is generated without having to rerun prediction network 106. The correction by prediction correction engine 108 may use fewer computing resources than re-running the prediction through prediction network 106 using the updated information. Further, the logic of prediction network 106 is simplified by not having to use the actual watch behavior in addition to the exogenous information. Also, because the amount of actual watch behavior needed to train prediction network 106 to output an accurate result is large, waiting to receive that amount of data is not feasible when evaluating the performance of videos on the video delivery service. As described in more detail below, prediction correction engine 108 may use the prior performance of the video on the video delivery service to correct the multiple outputs from prediction network 106 for future time periods.

The two-stage design for performing the prediction involves two stochastic processes, which means that the prediction may be random in nature, or may have a random probability distribution or pattern that can be analyzed statistically but may not be predicted precisely. The predicted values are stochastic in nature, and a single point estimate may not capture the complete spectrum of possible values. The two stages use model parameters that are learned from a limited quantity of data and are used to generate the prediction. Given a first set of training data and a second set of training data, the learned model parameters may differ between the two sets. This results in uncertainty in the prediction.

Prediction interval engine 112 provides a mechanism to estimate the resulting uncertainty in the prediction output by bounding the range of likely values. For example, the mechanism provides a range of values that is likely to contain the true unknown value (e.g., the unobserved watch percentage).

The output of the prediction with the prediction interval may be used by the video delivery service. For example, the outputs may be used to evaluate content licensing deals, as well as for other forecasting and goal-setting purposes. The prediction interval may indicate an upper bound and a lower bound for the values predicted by prediction network 106. For example, if prediction network 106 outputs a watch percentage of 0.1%, the prediction interval may be 0.09% as a lower bound and 0.13% as an upper bound. These bounds define a range within which the watch percentage falls with high probability.

Prediction Network

FIG. 2 depicts a more detailed example of prediction network 106 according to some embodiments. Prediction network 106 may include multiple stages that each receive an input 202 and produce an output 210. For example, an input X_(t−1) 202-1 may be input at a time t−1. Input 202-1 may include the exogenous information for the time t−1. A cell 204-1 receives the input X_(t−1) in addition to a prediction from a previous cell. The first cell does not receive a previous prediction since it makes the first prediction. Cell 204-1 processes the exogenous information and the prior prediction and outputs a prediction. The prediction generation inside of cell 204-1 is described in more detail below.

The output of cell 204-1 is input into two dense layers 206-1 and 208-1, although a different number of dense layers may be used. The dense layers may refine the prediction. Although dense layers are described, other types of layers, such as a stacked LSTM layer, a dropout layer, a batch norm layer, etc., may be used. Dense layer 208-1 generates an output Y_(t−1) 210-1, which is the performance prediction for the time t−1. For example, output Y_(t−1) may be the weekly watch behavior for a week t−1. A many-to-one configuration does not need these per-step dense layers because a many-to-one prediction network does not have an output from cells for previous time steps; rather, there is only an output for the very last time step. A many-to-many configuration, however, needs an output from a cell 204 at each time step. The dense layers provide more modeling flexibility and extensibility to prediction network 106.

An input X_(t) 202-2 includes exogenous information at a time t. A cell 204-2 receives input X_(t) in addition to the prediction output by cell 204-1. Cell 204-2 can then generate a prediction for time t using input X_(t) and the prediction from cell 204-1. The output of cell 204-2 is processed through dense layer 206-2 and dense layer 208-2. Then, the output for time t is output Y_(t) 210-2, which may be the weekly watch percentage for time t.

Similarly, a cell 204-3 receives input X_(t+1) 202-3 for a time t+1, which may include exogenous information at the time t+1, in addition to the prediction from cell 204-2. The output of cell 204-3 is processed through a dense layer 206-3 and a dense layer 208-3. An output Y_(t+1) 210-3 may predict the weekly watch behavior for the time t+1. The above process continues for as many time periods as are being predicted, where each time period may be associated with a cell 204. Additionally, each input and output may be associated with a time period being predicted.
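The unrolling in FIG. 2 can be sketched in plain Python: one cell step per input X_t, followed by two dense layers at every step, so each time step yields its own output Y_t. In this sketch the dense weights are shared across steps, as a time distributed layer would do; that sharing, the toy one-unit recurrent cell, and all weights are assumptions for illustration. An LSTM cell with the same `(x_t, state) -> (h_t, state)` interface could be substituted.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias)


def relu(v: float) -> float:
    return max(0.0, v)


def dense(layer: Layer, x: Vector,
          activation: Optional[Callable[[float], float]] = None) -> Vector:
    """Fully connected layer: W x + b, optionally followed by an activation."""
    weights, bias = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def run_many_to_many(inputs, cell, state0, layer1: Layer, layer2: Layer) -> List[float]:
    """Unroll a recurrent cell and emit one prediction per time step."""
    state = state0
    outputs = []
    for x_t in inputs:
        h_t, state = cell(x_t, state)     # cell sees X_t and the previous step's state
        z_t = dense(layer1, h_t, relu)    # first dense layer (206-x)
        y_t = dense(layer2, z_t)[0]       # second dense layer (208-x) -> scalar Y_t
        outputs.append(y_t)               # keep every step's output (many-to-many)
    return outputs


def toy_cell(x_t: Vector, h_prev: float) -> Tuple[Vector, float]:
    """Hypothetical one-unit recurrent cell standing in for cell 204."""
    h_t = math.tanh(0.5 * x_t[0] + 0.8 * h_prev)
    return [h_t], h_t


weekly_inputs = [[1.0], [0.0], [1.0]]
preds = run_many_to_many(weekly_inputs, toy_cell, 0.0, ([[1.0]], [0.0]), ([[2.0]], [0.1]))
```

Because the loop appends an output at every step instead of returning only the last one, three weekly inputs produce three weekly predictions.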

Prediction network 106 may be trained to output the prediction. The training data set may be limited to a number of videos over a time period, such as 240 shows over a two-year time range. To overcome over-fitting from the lack of data and to mitigate numerical stability issues, different techniques, such as drop-out, weight penalization, and batch normalization, may be used.

The above network that uses the prior predictions may be different from typical networks that use prior predictions. For example, a typical problem may be a natural language processing task that takes many inputs but produces a single output, such as predicting a word from a sequence of input words. Prediction network 106 has been altered to generate multiple outputs because prediction network 106 is used to predict the performance of a video at multiple time periods. Instead of taking a sequence of input data and outputting a single result, prediction network 106 generates a sequence of predictions. A layer, called a time distributed layer, is added to make the many-to-many prediction possible.

FIG. 3 depicts a more detailed example of a cell 204 according to some embodiments. Cells 204 include a number of neurons 302, 304, and 306. For example, cell 204-1 includes neurons 302-1, 304-1, and 306-1, cell 204-2 includes neurons 302-2, 304-2, and 306-2, and cell 204-3 includes neurons 302-3, 304-3, and 306-3. Each neuron 302, 304, and 306 may receive an input, apply the input to a model, and then generate an output. For example, each neuron may receive an input value, apply a function to the input value, and output a prediction value.

When predicting watch behavior, one aspect of prediction network 106 may become a problem: the number of historical data points used to predict the current value needs to be determined. The number of historical data points may be referred to as the look-back length. For example, the look-back length may be the length of the longest input among all the inputs being used. For the video delivery service, the look-back length may be how far back in watch history the network looks, such as the past seven weeks. When there is only one output, such as in a natural language prediction, the look-back length may not be a problem because the look-back length may be the length of the longest sentence that is input. For example, if a sentence is shorter than the maximum length, the end of the sentence can be padded, such as by adding dummy values at the end of the sentence. However, when predicting watch behavior, padding a watch percentage prediction may not be reasonable because determining dummy values for the input data is not intuitive. This means that prediction network 106 cannot use the length of the oldest show on the video delivery service as the look-back length, since every show with a shorter history would then need padding. The length of the oldest show refers to the history length of the earliest show released on the video delivery service. To determine the look-back length, the operation of cell 204 is analyzed.
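A minimal sketch of how a fixed look-back length can turn one title's weekly history into training sequences without padding: windows are taken only where enough real weeks exist, so a title with a shorter history contributes no windows rather than dummy values. The function name and the per-week target alignment (one target per input week, for a many-to-many network) are assumptions.

```python
from typing import List, Sequence, Tuple

Window = Tuple[List[List[float]], List[float]]


def make_lookback_windows(features: Sequence[Sequence[float]],
                          targets: Sequence[float], lookback: int) -> List[Window]:
    """Split one title's weekly history into fixed-length (inputs, targets) windows.

    Only full windows of `lookback` real weeks are produced; nothing is padded.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    if len(features) != len(targets):
        raise ValueError("features and targets must align week by week")
    windows = []
    for end in range(lookback, len(features) + 1):
        start = end - lookback
        windows.append(([list(f) for f in features[start:end]], list(targets[start:end])))
    return windows


# Ten weeks of one-feature history with a hypothetical look-back of seven weeks.
history = [[float(week)] for week in range(10)]
weekly_share = [week / 100 for week in range(10)]
windows = make_lookback_windows(history, weekly_share, lookback=7)
```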

Each cell may be composed of multiple neurons, as described above. FIG. 4 depicts an example of a neuron 302 according to some embodiments. Neuron 302 receives an input of h_(t−1) and C_(t−1). The current cell value C_(t) is a weighted sum of the previous cell value C_(t−1) and the current input value X_(t). Cell 204 may use the following equations:

$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$$

$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$$

$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$$

$$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + U_c h_{t-1} + b_c)$$

$$h_t = o_t \circ \sigma_h(c_t)$$

The variables W, U, and b are network weights, also called parameters, that are estimated by training. The variable x is the input data. The variable h is the output from a neuron of a cell 204. The variable c is a cell state. The variable o_(t) is a number in the range of 0 to 1 and can be considered a probability, called an output probability, that is applied to the output of the neuron. The variable i_(t) is a number in the range of 0 to 1 and can be considered a probability, called an input probability, that is applied to the input to the neuron. The variable c′_(t), which corresponds to the term σ_(c)(W_(c)x_(t)+U_(c)h_(t−1)+b_(c)), is the candidate cell state of the current neuron. The variable f_(t) is the remember probability, a number in the range of 0 to 1 that represents a probability of using historical information; it is also conventionally called a forget probability, although a value near 1 retains, rather than discards, the historical information. The functions σ_(g), σ_(c), and σ_(h) are activation functions (commonly a sigmoid for σ_(g) and a hyperbolic tangent for σ_(c) and σ_(h)), and ∘ denotes element-wise multiplication.
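To make the gate equations concrete, the following pure-Python sketch performs one cell step for small vectors. It assumes σ_(g) is the logistic sigmoid and σ_(c) and σ_(h) are tanh (common choices; the equations above do not fix them), and it also returns f_(t) and i_(t) so the gate values can be inspected. The parameter layout `p[gate] = (W, U, b)` is a hypothetical convention.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector, p: Params):
    """One step of the gate equations; returns (h_t, c_t, f_t, i_t)."""
    def pre(gate: str) -> Vector:
        W, U, b = p[gate]
        return [wx + uh + bb for wx, uh, bb in zip(matvec(W, x_t), matvec(U, h_prev), b)]

    f_t = [sigmoid(z) for z in pre("f")]       # remember (forget-gate) probability
    i_t = [sigmoid(z) for z in pre("i")]       # input probability
    o_t = [sigmoid(z) for z in pre("o")]       # output probability
    c_cand = [math.tanh(z) for z in pre("c")]  # candidate state c'_t
    c_t = [f * c + i * cc for f, c, i, cc in zip(f_t, c_prev, i_t, c_cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t, f_t, i_t


# With all-zero weights every gate is sigmoid(0) = 0.5 and the candidate is tanh(0) = 0.
params = {gate: ([[0.0]], [[0.0]], [0.0]) for gate in ("f", "i", "o", "c")}
h, c, f, i = lstm_step([1.0], [0.0], [2.0], params)
```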

The weights of neuron 302-2 include the remember probability f_(t) and the input probability i_(t). The remember probability f_(t) and the input probability i_(t) change over time. The look-back length affects the remember probability and the input probability, which should be considered when selecting a look-back length. In some examples, each neuron may give more weight to the historical data than to the current input data when predicting a current value. For example, the remember probability is higher than the input probability for all neurons across all timestamps when a look-back length such as 10 weeks is chosen. Analysis of the structure of neuron 302 shows that when the look-back length is longer, neuron 302 may weigh the historical data more heavily than the current data.

Current value c_(t) 302-2 is the weighted sum of the previous cell value c_(t−1) from neuron 302-1 and the current input value X_(t) 202-2, combined with the output h_(t−1) from neuron 302-1 of the previous cell 204-1 (with variable f_(t), variable i_(t), and variable c′_(t) applied). The current value is then output to neuron 302-3 in the next cell 204-3. Also, the current value is combined with the output probability and output as output h_(t) to neuron 302-3 in the next cell 204-3. The outputs h_(t) from the neurons form the output for a cell 204. By not discarding the outputs from the neurons, the many-to-many structure is generated. In a many-to-one structure, the outputs from cells other than the last cell are discarded and not used.

Some embodiments compute the ratio of the remember probability to the input probability and compare the ratios at the last neuron for different look-back lengths. In some examples, a cell with a look-back length of twelve may have consistently higher remember-to-input probability ratios for all neurons than a cell with a look-back length of four. This means the cell tends to remember more from history when the look-back length is longer. Some embodiments select a look-back length within a range of five to ten to balance the use of the prior prediction and the current input.
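The look-back comparison can be sketched as follows, given recorded last-step gate values f_(t) and i_(t) for each candidate look-back length. The selection rule here, picking the length in the range five to ten whose mean remember-to-input ratio is closest to 1, is a hypothetical way to "balance" history and current input; the text does not specify an exact rule, and the gate values in the example are made up.

```python
from statistics import mean
from typing import Dict, List, Tuple

Gates = Tuple[List[float], List[float]]  # (f_t per neuron, i_t per neuron) at the last step


def remember_to_input_ratios(f_last: List[float], i_last: List[float]) -> List[float]:
    """Per-neuron ratio f_t / i_t at the last time step."""
    return [f / i for f, i in zip(f_last, i_last)]


def choose_lookback(gates_by_lookback: Dict[int, Gates], low: int = 5, high: int = 10,
                    target: float = 1.0) -> int:
    """Pick the look-back length in [low, high] whose mean ratio is closest to `target`.

    The closeness-to-target criterion is a hypothetical balancing rule.
    """
    candidates = {
        length: mean(remember_to_input_ratios(f, i))
        for length, (f, i) in gates_by_lookback.items()
        if low <= length <= high
    }
    if not candidates:
        raise ValueError("no look-back length in the allowed range")
    return min(candidates, key=lambda length: abs(candidates[length] - target))


# Made-up gate values for two neurons at four candidate look-back lengths.
example_gates = {
    4: ([0.5, 0.5], [0.6, 0.6]),
    6: ([0.6, 0.6], [0.55, 0.55]),
    8: ([0.7, 0.7], [0.5, 0.5]),
    12: ([0.9, 0.9], [0.3, 0.3]),
}
chosen = choose_lookback(example_gates)
```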

Prediction Output

FIG. 5 shows an example graph 500 of a prediction according to some embodiments. For example, the prediction may be for a show that releases multiple episodes over a number of weeks. At 502, the predicted performance, such as the weekly watch hour percentage, is shown. The prediction at 502 represents the initial prediction over the time period. At 504, the actual watch behavior is shown.

As the video delivery service receives actual watch behavior, the video delivery service may determine the differences between the initial prediction and the actual watch behavior. However, some embodiments do not use the actual watch behavior to generate another prediction using prediction network 106. Rather, a more efficient process to alter the initial prediction is used.

FIG. 6 shows an example graph 600 of the altering of the initial prediction according to some embodiments. At 606, the revision to the initial prediction is shown. The actual watch behavior is known for the period before point 608. From point 608 onward, the initial prediction is revised as shown at 606.

To generate the revised prediction, prediction correction engine 108 computes the difference between the actual data and the initial prediction to construct a residual, which represents the difference. Then, prediction correction engine 108 uses a forecasting technique to predict the possible values of future residuals. The forecasted residuals are then used to correct the initial prediction for the future time period. For each t in the observed period, the residual r_(t) can be computed as:

$$r_t = y_t^{\text{observed}} - y_t^{\text{pred}}$$

where y_(t)^(observed) is the actual data and y_(t)^(pred) is the predicted data.
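The residual construction amounts to an element-wise difference over the observed weeks. A minimal sketch, assuming the observed series and the initial prediction are aligned so that index 0 is the same week in both (the function name is hypothetical):

```python
from typing import List, Sequence


def observed_residuals(observed: Sequence[float], initial_pred: Sequence[float]) -> List[float]:
    """r_t = y_t^observed - y_t^pred for each week t in the observed period.

    `observed` holds the actual weekly values received so far; `initial_pred` holds
    the stage-one predictions for the full horizon, starting at the same week.
    """
    if len(observed) > len(initial_pred):
        raise ValueError("more observed weeks than predicted weeks")
    # zip stops at the end of `observed`, i.e. at the last observed week.
    return [y_obs - y_pred for y_obs, y_pred in zip(observed, initial_pred)]


r = observed_residuals([0.12, 0.10], [0.10, 0.11, 0.09])
```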

Given the residuals as constructed from the data to the left of point 608, prediction correction engine 108 predicts the possible values of residuals for the time period to the right of point 608. FIG. 7 depicts an example graph 700 of the residuals that are calculated according to some embodiments. A line 702 shows the actual residuals and a line 704 shows the forecasted residuals.

To generate the future residuals, time series forecasting techniques are used. Given the generated future residuals, the following equation may be used to compute the revised predictions:

$$y_t^{\text{corrected}} = y_t^{\text{pred}} + r_t^{\text{forecasted}}, \quad t \text{ in the prediction period}$$

The above equation takes the initial prediction, y_(t)^(pred), and adds the forecasted residual, r_(t)^(forecasted), for each t in the prediction time period. This results in the revised prediction at 606 in FIG. 6.
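The text does not name a specific time series forecasting technique. As one possible choice, the sketch below fits a mean-reverting AR(1) model to the observed residuals by least squares, forecasts residuals for the remaining weeks, and adds them to the initial prediction per the correction equation. The AR(1) choice, the clipping of the coefficient, and all function names are assumptions; other forecasters, such as exponential smoothing or ARIMA, could be substituted. Note that prediction network 106 is not re-run at any point.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_ar1(residuals: Sequence[float]) -> Tuple[float, float]:
    """Least-squares AR(1) fit: (r_t - mu) = phi * (r_{t-1} - mu) + noise."""
    if len(residuals) < 3:
        raise ValueError("need at least three observed residuals")
    mu = mean(residuals)
    d = [r - mu for r in residuals]
    num = sum(cur * prev for cur, prev in zip(d[1:], d[:-1]))
    den = sum(prev * prev for prev in d[:-1])
    phi = num / den if den else 0.0
    phi = max(-0.99, min(0.99, phi))  # keep |phi| < 1 so forecasts revert to the mean
    return mu, phi


def forecast_residuals(residuals: Sequence[float], horizon: int) -> List[float]:
    """r_t^forecasted for the `horizon` weeks after the last observed residual."""
    mu, phi = fit_ar1(residuals)
    last = residuals[-1] - mu
    return [mu + (phi ** k) * last for k in range(1, horizon + 1)]


def correct_predictions(initial_pred: Sequence[float], residuals: Sequence[float]) -> List[float]:
    """y_t^corrected = y_t^pred + r_t^forecasted for weeks after the observed period.

    Returns only the corrected future weeks; observed weeks need no correction.
    """
    future = list(initial_pred[len(residuals):])
    r_hat = forecast_residuals(residuals, len(future))
    return [y + r for y, r in zip(future, r_hat)]


corrected = correct_predictions([0.1] * 6, [0.02] * 4)
```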

Time series forecasting is used because of how the video delivery service uses the prediction. The video delivery service makes decisions on the efficiency of a video before having a sufficient amount of watch history to predict the weekly watch behavior. Re-running the prediction network after receiving the actual watch history may not be feasible because the video delivery service makes decisions regarding cost before a video is released on the service or soon after the release.

Prediction Interval

Prediction interval engine 112 may generate the prediction interval for the prediction. The two-stage prediction outputs a point estimate, such as a watch percentage prediction of 0.1% for the first week of January 2019. The watch percentage of 0.1% is called a point estimate because it is a single number rather than a range of values for the watch hour percentage. For the reasons stated above, the value of 0.1% may not be fully trusted. Suppose the true value for the first week of January 2019 is 0.12%; the prediction interval may compute an upper bound and a lower bound, for example 0.13% and 0.09%, respectively. The true value of 0.12% then falls in the interval (0.09%, 0.13%). This interval is called a prediction interval. Given that the video delivery service uses the weekly watch behavior to evaluate the efficiency of a video, having a prediction interval is important because the efficiency estimate must be reasonably accurate. By estimating an efficiency interval of percentage hours to percentage cost, the video delivery service can confidently predict an efficiency that will most likely fall between a lower bound and an upper bound.
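Because efficiency is PH/PC, a PH prediction interval maps directly to an efficiency interval when PC is treated as known. The sketch below uses the 0.09% to 0.13% example above; the PC value of 0.08% and the function names are hypothetical.

```python
from typing import Tuple


def efficiency_interval(ph_lower: float, ph_upper: float, pc: float) -> Tuple[float, float]:
    """Map a PH prediction interval to an efficiency (PH / PC) interval, with PC known."""
    if pc <= 0:
        raise ValueError("percentage cost must be positive")
    if ph_lower > ph_upper:
        raise ValueError("lower bound exceeds upper bound")
    return ph_lower / pc, ph_upper / pc


def covers(interval: Tuple[float, float], value: float) -> bool:
    """True if the value lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


eff_low, eff_high = efficiency_interval(0.09, 0.13, pc=0.08)
print(covers((0.09, 0.13), 0.12), eff_low, eff_high)
```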

Prediction interval engine 112 uses the following two-stage model:

$$y_t^{o} = y_t^{\mathrm{exogenous}} + \varepsilon_t^{\mathrm{exogenous}} + y_t^{\mathrm{residual}} + \varepsilon_t^{\mathrm{residual}},$$

where $y_t^{o}$ is the unobserved true value, which can be decomposed into two stochastic components, $y_t^{\mathrm{exogenous}} + \varepsilon_t^{\mathrm{exogenous}}$ and $y_t^{\mathrm{residual}} + \varepsilon_t^{\mathrm{residual}}$, that represent the initial prediction (from prediction network 106) and the residual forecasting process (from prediction correction engine 108), respectively. The terms $\varepsilon_t^{\mathrm{exogenous}}$ and $\varepsilon_t^{\mathrm{residual}}$ are disturbance terms: unobserved random variables that add noise to the true values $y_t^{\mathrm{exogenous}}$ and $y_t^{\mathrm{residual}}$ in the stochastic processes. Prediction interval engine 112 then generates the two-stage prediction for $y_t^{o}$ as:

$$\hat{y}_t = \hat{y}_t^{\mathrm{exogenous}} + \hat{y}_t^{\mathrm{residual}},$$

where $\hat{y}_t^{\mathrm{exogenous}}$ is the estimate from prediction network 106 and $\hat{y}_t^{\mathrm{residual}}$ is estimated by prediction interval engine 112. Prediction interval engine 112 derives the variance of the prediction error, $\mathrm{var}(y_t^{o} - \hat{y}_t)$, as:

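$$\mathrm{var}(y_t^{o} - \hat{y}_t) = \mathrm{var}(y_t^{\mathrm{exogenous}} - \hat{y}_t^{\mathrm{exogenous}}) + \mathrm{var}(\varepsilon_t^{\mathrm{exogenous}}) + \mathrm{var}(y_t^{\mathrm{residual}} - \hat{y}_t^{\mathrm{residual}}) + \mathrm{var}(\varepsilon_t^{\mathrm{residual}})$$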
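The decomposition follows from writing out the prediction error term by term. The step below makes explicit an assumption that the text does not state: the four error terms are mutually uncorrelated, which is what lets the cross-covariance terms drop out.

```latex
\begin{aligned}
y_t^{o} - \hat{y}_t
  &= \underbrace{\bigl(y_t^{\mathrm{exogenous}} - \hat{y}_t^{\mathrm{exogenous}}\bigr)}_{e_1}
   + \underbrace{\varepsilon_t^{\mathrm{exogenous}}}_{e_2}
   + \underbrace{\bigl(y_t^{\mathrm{residual}} - \hat{y}_t^{\mathrm{residual}}\bigr)}_{e_3}
   + \underbrace{\varepsilon_t^{\mathrm{residual}}}_{e_4},\\
\mathrm{var}(y_t^{o} - \hat{y}_t)
  &= \sum_{i=1}^{4} \mathrm{var}(e_i) + 2\sum_{i<j} \mathrm{cov}(e_i, e_j)
   = \sum_{i=1}^{4} \mathrm{var}(e_i)
   \quad \text{if } \mathrm{cov}(e_i, e_j) = 0 \text{ for } i \neq j.
\end{aligned}
```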

Prediction interval engine 112 computes each of the four variances. For the first variance, the prediction model is $f^{\mathrm{Pred}}(w; x)$, which is a nonlinear function of the network weights $w$. Using a first-order Taylor expansion, prediction interval engine 112 obtains:

$$\hat{y}_t^{\mathrm{exogenous}} \approx f^{\mathrm{Pred}}(w^{*}; x) + g^{T}(w^{*}, x)\,(\hat{w} - w^{*}),$$

where $w^{*}$ is the set of optimal weights, $\hat{w}$ is the set of weights learned during training of the prediction model, and $g(w^{*}, x)$ is the vector of first-order partial derivatives of $f^{\mathrm{Pred}}(w^{*}; x)$ with respect to $w$, evaluated at $w^{*}$, abbreviated as $g$. Prediction interval engine 112 then expresses the variance $\mathrm{var}(y_t^{\mathrm{exogenous}} - \hat{y}_t^{\mathrm{exogenous}})$ as:

$$\mathrm{var}(y_t^{\mathrm{exogenous}} - \hat{y}_t^{\mathrm{exogenous}}) = \mathrm{var}(\varepsilon_t^{\mathrm{exogenous}})\, g^{T} (J^{T} J)^{-1} g.$$
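As a sanity check on this expression, assume $J$ denotes the matrix whose rows are the gradients of $f^{\mathrm{Pred}}$ with respect to $w$ at each training input (the usual reading in nonlinear-regression prediction intervals) and consider the special case of a linear model $f^{\mathrm{Pred}}(w; x) = x^{T} w$. Then the Taylor expansion is exact, $g = x$, $J = X$ (the training design matrix), and the formula reduces to the familiar ordinary-least-squares variance of a fitted mean:

```latex
f^{\mathrm{Pred}}(w; x) = x^{T} w
\;\Rightarrow\; g = x,\quad J = X
\;\Rightarrow\;
\mathrm{var}\bigl(y_t^{\mathrm{exogenous}} - \hat{y}_t^{\mathrm{exogenous}}\bigr)
  = \mathrm{var}(\varepsilon_t^{\mathrm{exogenous}})\, x^{T} (X^{T} X)^{-1} x .
```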

Because prediction network 106 may use regularization during network training, prediction interval engine 112 instead uses the regularized form:

$$\mathrm{var}(y_t^{\mathrm{exogenous}} - \hat{y}_t^{\mathrm{exogenous}}) = \mathrm{var}(\varepsilon_t^{\mathrm{exogenous}})\, g^{T} (J^{T} J + \lambda I)^{-1} (J^{T} J) (J^{T} J + \lambda I)^{-1} g,$$

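where $J$ is the Jacobian matrix of $f^{\mathrm{Pred}}(w^{*}; x)$ with respect to $w$ over the training inputs, evaluated at $w^{*}$ (so that $J^{T} J$ approximates the Hessian of the squared-error loss), $\lambda$ is the regularization parameter, and $\mathrm{var}(\varepsilon_t^{\mathrm{exogenous}})$ can be estimated by the mean squared error on the training data, denoted $\sigma_{\mathrm{Pred}}^{2}$. For the third variance, the variance of the residual forecast is: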
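The following NumPy sketch evaluates the network part of the variance, $\sigma_{\mathrm{Pred}}^{2}\bigl(1 + g^{T} (J^{T} J + \lambda I)^{-1} (J^{T} J) (J^{T} J + \lambda I)^{-1} g\bigr)$, for a toy model. It assumes a hypothetical one-unit tanh network in place of prediction network 106, uses central finite differences for the gradients, treats $J$ as the Jacobian of the training-set outputs with respect to the weights, and plugs in the trained weights $\hat{w}$ for the unknown $w^{*}$. All function names are illustrative.

```python
import numpy as np

def f_pred(w, x):
    """Hypothetical stand-in for f^Pred(w; x): a single tanh unit plus a bias."""
    return w[0] * np.tanh(w[1] * x) + w[2]

def jacobian(w, xs, eps=1e-6):
    """Central-difference derivatives of f_pred: one row per input, one column per weight."""
    w = np.asarray(w, dtype=float)
    xs = np.atleast_1d(np.asarray(xs, dtype=float))
    J = np.empty((xs.size, w.size))
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        J[:, j] = (f_pred(w + step, xs) - f_pred(w - step, xs)) / (2 * eps)
    return J

def network_variance(w_hat, x_train, y_train, x_new, lam):
    """sigma_Pred^2 * (1 + g^T (J^T J + lam I)^-1 (J^T J) (J^T J + lam I)^-1 g)."""
    w_hat = np.asarray(w_hat, dtype=float)
    resid = np.asarray(y_train) - f_pred(w_hat, np.asarray(x_train))
    sigma2 = np.mean(resid ** 2)              # sigma_Pred^2 from training mean squared error
    J = jacobian(w_hat, x_train)              # training Jacobian, n_train x n_weights
    g = jacobian(w_hat, x_new)[0]             # gradient at the new input
    JtJ = J.T @ J
    A_inv = np.linalg.inv(JtJ + lam * np.eye(w_hat.size))
    return sigma2 * (1.0 + g @ A_inv @ JtJ @ A_inv @ g)
```

Setting `lam=0.0` recovers the unregularized form. For any `lam > 0` the sandwich term can only shrink, because each eigenvalue $a > 0$ of $J^{T} J$ contributes $a/(a+\lambda)^{2} \le 1/a$.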

$$\mathrm{var}(y_t^{\mathrm{residual}} - \hat{y}_t^{\mathrm{residual}}) = F^{T} R_t F,$$

where $F$ is the design vector of the model used by prediction interval engine 112 and $R_t$ is estimated by a filtering algorithm. Prediction interval engine 112 estimates the last variance, $\mathrm{var}(\varepsilon_t^{\mathrm{residual}})$, by maximum likelihood within the same model and denotes it $\sigma_{\mathrm{residual}}^{2}$.
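The text leaves the residual model and the filtering algorithm unspecified. As one possible instance, the sketch below assumes a local-level (random walk plus noise) dynamic linear model with design vector $F = [1]$, evolution matrix $G = [1]$, observation variance `V` (playing the role of $\sigma_{\mathrm{residual}}^{2}$), and evolution variance `W`, both assumed to have been estimated already, for example by maximum likelihood. A Kalman filter runs over the observed residuals, and the $h$-step-ahead state variance $R_{T+h} = C_T + hW$ gives the $F^{T} R_t F$ term. Names and parameters are hypothetical.

```python
import numpy as np

def local_level_filter(residuals, V, W, m0=0.0, C0=1e6):
    """Kalman filter for y_t = mu_t + v_t, mu_t = mu_{t-1} + w_t.

    Returns the filtered level mean m_T and variance C_T after the last residual.
    A large C0 gives a vague prior on the initial level.
    """
    m, C = m0, C0
    for y in residuals:
        R = C + W              # prior state variance (G = 1)
        Q = R + V              # one-step forecast variance (F = 1)
        K = R / Q              # Kalman gain
        m = m + K * (y - m)    # posterior level mean
        C = R * V / Q          # posterior level variance, equal to R - K * R
    return m, C

def residual_forecast(residuals, V, W, horizon):
    """Flat level forecasts and the F^T R_t F term for t = T+1, ..., T+horizon."""
    m, C = local_level_filter(residuals, V, W)
    F = np.array([1.0])
    means, frf = [], []
    for h in range(1, horizon + 1):
        R_h = np.array([[C + h * W]])    # h-step-ahead state variance
        means.append(m)                   # local-level forecast mean stays constant
        frf.append(float(F @ R_h @ F))    # F^T R_t F
    return np.array(means), np.array(frf)
```

Under this model the $F^{T} R_t F$ term grows linearly with the forecast horizon, so intervals widen the further the prediction period extends past the last observed residual.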

Prediction interval engine 112 combines the four variances to obtain the final variance estimate:

$$\mathrm{var}(y_t^{o} - \hat{y}_t) = \sigma_{\mathrm{Pred}}^{2}\bigl(1 + g^{T} (J^{T} J + \lambda I)^{-1} (J^{T} J) (J^{T} J + \lambda I)^{-1} g\bigr) + \sigma_{\mathrm{residual}}^{2} + F^{T} R_t F.$$

This variance determines the width of the prediction interval. Given the above formula, prediction interval engine 112 can construct a prediction interval over the prediction period, such as a 90% prediction interval, which is expected to cover the unknown true values with a probability of 0.9 under the model assumptions. The prediction interval thus bounds the actual watch percentage at a stated confidence level.
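Assuming the prediction error is approximately normally distributed (an assumption the text does not state), the interval can be formed as $\hat{y}_t \pm z\sqrt{\mathrm{var}(y_t^{o} - \hat{y}_t)}$, where $z$ is the standard normal quantile for the chosen coverage, about 1.645 for 90%. The minimal sketch below uses only the Python standard library; the function and argument names are hypothetical, and the three variance inputs correspond to the three summands of the final variance formula.

```python
import math
from statistics import NormalDist

def prediction_interval(y_hat, network_term, sigma2_residual, frf, level=0.90):
    """Two-sided Gaussian prediction interval for one time step.

    network_term    -- sigma_Pred^2 * (1 + g^T (J^T J + lam I)^-1 (J^T J) (J^T J + lam I)^-1 g)
    sigma2_residual -- variance of the residual disturbance
    frf             -- F^T R_t F from the residual model
    """
    total_var = network_term + sigma2_residual + frf
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 when level = 0.90
    half_width = z * math.sqrt(total_var)
    return y_hat - half_width, y_hat + half_width
```

Raising `level` widens the interval, trading tighter bounds for higher coverage.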

Conclusion

Accordingly, the above process uses a prediction network to generate an initial prediction of the performance of a video, a task that requires multiple outputs and depends on previous behavior. The second stage then applies a correction that uses actual watch behavior to adjust the initial prediction. This generates the correction more efficiently because the prediction network does not need to be rerun to regenerate the prediction. Finally, a prediction interval estimates the uncertainty of the output prediction and delivers a range of values.

System

Features and aspects as disclosed herein may be implemented in conjunction with a video streaming system 800 in communication with multiple client devices via one or more communication networks as shown in FIG. 8. Aspects of the video streaming system 800 are described merely to provide an example of an application for enabling distribution and delivery of content prepared according to the present disclosure. It should be appreciated that the present technology is not limited to streaming video applications and may be adapted for other applications and delivery mechanisms.

In one embodiment, a media program provider may include a library of media programs. For example, the media programs may be aggregated and provided through a site (e.g., website), application, or browser. A user can access the media program provider's site or application and request media programs. The user may be limited to requesting only media programs offered by the media program provider.

In system 800, video data may be obtained from one or more sources, for example, from a video source 810, for use as input to a video content server 802. The input video data may comprise raw or edited frame-based video data in any suitable digital format, for example, Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-4, VC-1, H.264/Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), or other format. In an alternative, a video may be provided in a non-digital format and converted to digital format using a scanner and/or transcoder. The input video data may comprise video clips or programs of various types, for example, television episodes, motion pictures, and other content produced as primary content of interest to consumers. The video data may also include audio, or only audio may be used.

The video streaming system 800 may include one or more computer servers or modules 802, 804, and/or 807 distributed over one or more computers. Each server 802, 804, 807 may include, or may be operatively coupled to, one or more data stores 809, for example databases, indexes, files, or other data structures. A video content server 802 may access a data store (not shown) of various video segments. The video content server 802 may serve the video segments as directed by a user interface controller communicating with a client device. As used herein, a video segment refers to a definite portion of frame-based video data, such as may be used in a streaming video session to view a television episode, motion picture, recorded live performance, or other video content.

In some embodiments, a video advertising server 804 may access a data store of relatively short videos (e.g., 10 second, 30 second, or 60 second video advertisements) configured as advertising for a particular advertiser or message. The advertising may be provided for an advertiser in exchange for payment of some kind or may comprise a promotional message for the system 800, a public service message, or some other information. The video advertising server 804 may serve the video advertising segments as directed by a user interface controller (not shown).

The video streaming system 800 also may include server system 102.

The video streaming system 800 may further include an integration and streaming component 807 that integrates video content and video advertising into a streaming video segment. For example, streaming component 807 may be a content server or streaming media server. A controller (not shown) may determine the selection or configuration of advertising in the streaming video based on any suitable algorithm or process. The video streaming system 800 may include other modules or units not depicted in FIG. 8, for example, administrative servers, commerce servers, network infrastructure, advertising selection engines, and so forth.

The video streaming system 800 may connect to a data communication network 812. A data communication network 812 may comprise a local area network (LAN), a wide area network (WAN), for example, the Internet, a telephone network, a wireless cellular telecommunications network (WCS) 814, or some combination of these or similar networks.

One or more client devices 820 may be in communication with the video streaming system 800, via the data communication network 812, wireless cellular telecommunications network 814, and/or another network. Such client devices may include, for example, one or more laptop computers 820-1, desktop computers 820-2, “smart” mobile phones 820-3, tablet devices 820-4, network-enabled televisions 820-5, or combinations thereof, via a router 818 for a LAN, via a base station 817 for a wireless cellular telecommunications network 814, or via some other connection. In operation, such client devices 820 may send data or instructions to, and receive data or instructions from, the system 800 in response to user input received from user input devices or other input. In response, the system 800 may serve video segments and metadata from the data store 809 to the client devices 820 responsive to selection of media programs. Client devices 820 may output the video content from the streaming video segment in a media player using a display screen, projector, or other video output device, and receive user input for interacting with the video content.

Distribution of audio-video data may be implemented from streaming component 807 to remote client devices over computer networks, telecommunications networks, and combinations of such networks, using various methods, for example streaming. In streaming, a content server streams audio-video data continuously to a media player component operating at least partly on the client device, which may play the audio-video data concurrently with receiving the streaming data from the server. Although streaming is discussed, other methods of delivery may be used. The media player component may initiate play of the video data immediately after receiving an initial portion of the data from the content provider. Traditional streaming techniques use a single provider delivering a stream of data to a set of end users. High bandwidth and processing power may be required to deliver a single stream to a large audience, and the required bandwidth of the provider may increase as the number of end users increases.

Streaming media can be delivered on-demand or live. Streaming enables immediate playback at any point within the file. End-users may skip through the media file to start playback or change playback to any point in the media file. Hence, the end-user does not need to wait for the file to progressively download. Typically, streaming media is delivered from a few dedicated servers having high bandwidth capabilities via a specialized device that accepts requests for video files, and with information about the format, bandwidth and structure of those files, delivers just the amount of data necessary to play the video, at the rate needed to play it. Streaming media servers may also account for the transmission bandwidth and capabilities of the media player on the destination client. Streaming component 807 may communicate with client device 820 using control messages and data messages to adjust to changing network conditions as the video is played. These control messages can include commands for enabling control functions such as fast forward, fast reverse, pausing, or seeking to a particular part of the file at the client.

Since streaming component 807 transmits video data only as needed and at the rate that is needed, precise control over the number of streams served can be maintained. The viewer will not be able to view high data rate videos over a lower data rate transmission medium. However, streaming media servers (1) provide users random access to the video file, (2) allow monitoring of who is viewing what video programs and how long they are watched, (3) use transmission bandwidth more efficiently, since only the amount of data required to support the viewing experience is transmitted, and (4) do not store the video file in the viewer's computer; instead, the media player discards it, thus allowing more control over the content.

Streaming component 807 may use TCP-based protocols, such as HTTP and Real Time Messaging Protocol (RTMP). Streaming component 807 can also deliver live webcasts and can multicast, which allows more than one client to tune into a single stream, thus saving bandwidth. Streaming media players may not rely on buffering the whole video to provide random access to any point in the media program. Instead, this is accomplished through the use of control messages transmitted from the media player to the streaming media server. Other protocols used for streaming include HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH). The HLS and DASH protocols deliver video over HTTP via a playlist of small segments that are made available in a variety of bitrates, typically from one or more content delivery networks (CDNs). This allows a media player to switch both bitrates and content sources on a segment-by-segment basis. The switching helps compensate for network bandwidth variances and also for infrastructure failures that may occur during playback of the video.
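HLS and DASH define how segments and bitrate variants are listed, but leave the choice of which variant to fetch to the player. As a hypothetical illustration of segment-by-segment switching, the sketch below picks, for each segment, the highest bitrate in an assumed ladder that fits within a safety fraction of the throughput measured for that segment. Real players generally use more elaborate adaptive-bitrate logic (for example, buffer-aware rules); the ladder values and the `safety` factor are made up.

```python
def choose_bitrates(ladder_kbps, throughput_kbps, safety=0.8):
    """Return one bitrate per segment using a throughput-based rule (hypothetical)."""
    ladder = sorted(ladder_kbps)
    choices = []
    for tput in throughput_kbps:
        budget = safety * tput                       # leave headroom for bandwidth variance
        fitting = [b for b in ladder if b <= budget]
        # Highest rung that fits; fall back to the lowest rung if none fits.
        choices.append(fitting[-1] if fitting else ladder[0])
    return choices
```

For example, with a ladder of 400, 1200, 3000, and 6000 kbps and measured throughputs of 8000, 2000, 300, and 4000 kbps, the rule selects 6000, 1200, 400 (the fallback), and 3000 kbps for the four segments.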

The delivery of video content by streaming may be accomplished under a variety of models. In one model, the user pays for the viewing of video programs, for example, paying a fee for access to the library of media programs or a portion of restricted media programs, or using a pay-per-view service. In another model widely adopted by broadcast television shortly after its inception, sponsors pay for the presentation of the media program in exchange for the right to present advertisements during or adjacent to the presentation of the program. In some models, advertisements are inserted at predetermined times in a video program, which times may be referred to as “ad slots” or “ad breaks.” With streaming video, the media player may be configured so that the client device cannot play the video without also playing predetermined advertisements during the designated ad slots.

Referring to FIG. 9, a diagrammatic view of an apparatus 900 for viewing video content and advertisements is illustrated. In selected embodiments, the apparatus 900 may include a processor (CPU) 902 operatively coupled to a processor memory 904, which holds binary-coded functional modules for execution by the processor 902. Such functional modules may include an operating system 906 for handling system functions such as input/output and memory access, a browser 908 to display web pages, and media player 910 for playing video. The memory 904 may hold additional modules not shown in FIG. 9, for example modules for performing other operations described elsewhere herein.

A bus 914 or other communication component may support communication of information within the apparatus 900. The processor 902 may be a specialized or dedicated microprocessor configured to perform particular tasks in accordance with the features and aspects disclosed herein by executing machine-readable software code defining the particular tasks. Processor memory 904 (e.g., random access memory (RAM) or other dynamic storage device) may be connected to the bus 914 or directly to the processor 902, and store information and instructions to be executed by a processor 902. The memory 904 may also store temporary variables or other intermediate information during execution of such instructions.

A computer-readable medium in a storage device 924 may be connected to the bus 914 and store static information and instructions for the processor 902; for example, the storage device (CRM) 924 may store the modules 906, 908, 910 and 912 when the apparatus 900 is powered off, from which the modules may be loaded into the processor memory 904 when the apparatus 900 is powered up. The storage device 924 may include a non-transitory computer-readable storage medium holding information, instructions, or some combination thereof, for example instructions that when executed by the processor 902, cause the apparatus 900 to be configured to perform one or more operations of a method as described herein.

A communication interface 916 may also be connected to the bus 914. The communication interface 916 may provide or support two-way data communication between the apparatus 900 and one or more external devices, e.g., the streaming system 800, optionally via a router/modem 926 and a wired or wireless connection. In the alternative, or in addition, the apparatus 900 may include a transceiver 918 connected to an antenna 929, through which the apparatus 900 may communicate wirelessly with a base station for a wireless communication system or with the router/modem 926. In the alternative, the apparatus 900 may communicate with a video streaming system 800 via a local area network, virtual private network, or other network. In another alternative, the apparatus 900 may be incorporated as a module or component of the system 800 and communicate with other components via the bus 914 or by some other modality.

The apparatus 900 may be connected (e.g., via the bus 914 and graphics processing unit 920) to a display unit 928. A display 928 may include any suitable configuration for displaying information to an operator of the apparatus 900. For example, a display 928 may include or utilize a liquid crystal display (LCD), touchscreen LCD (e.g., capacitive display), light emitting diode (LED) display, projector, or other display device to present information to a user of the apparatus 900 in a visual display.

One or more input devices 930 (e.g., an alphanumeric keyboard, microphone, keypad, remote controller, game controller, camera or camera array) may be connected to the bus 914 via a user input port 922 to communicate information and commands to the apparatus 900. In selected embodiments, an input device 930 may provide or support control over the positioning of a cursor. Such a cursor control device, also called a pointing device, may be configured as a mouse, a trackball, a track pad, touch screen, cursor direction keys or other device for receiving or tracking physical movement and translating the movement into electrical signals indicating cursor movement. The cursor control device may be incorporated into the display unit 928, for example using a touch sensitive screen. A cursor control device may communicate direction information and command selections to the processor 902 and control cursor movement on the display 928. A cursor control device may have two or more degrees of freedom, for example allowing the device to specify cursor positions in a plane or three-dimensional space.

Some embodiments may be implemented in a non-transitory computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or machine. The computer-readable storage medium contains instructions for controlling a computer system to perform a method described by some embodiments. The computer system may include one or more computing devices. The instructions, when executed by one or more computer processors, may be configured to perform that which is described in some embodiments.

As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” includes plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

The above description illustrates various embodiments along with examples of how aspects of some embodiments may be implemented. The above examples and embodiments should not be deemed to be the only embodiments and are presented to illustrate the flexibility and advantages of some embodiments as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents may be employed without departing from the scope hereof as defined by the claims.

What is claimed is:
 1. A method comprising: receiving, by a computing device, a plurality of inputs for a video for a plurality of times at a prediction network that includes a plurality of cells; generating, by the computing device, a plurality of predictions of watch behavior of the video for the plurality of inputs at the plurality of cells, the plurality of predictions predicting a performance of the video on a video delivery service for the plurality of times, wherein cells in the plurality of cells generate a prediction using an input at a time and a prior prediction from a cell at a previous time; receiving, by the computing device, actual performance data generated from users viewing the video on the video delivery service before a time; generating, by the computing device, a time series residual for at least a portion of the plurality of predictions from the actual performance data and prior predictions before the time; adjusting, by the computing device, at least the portion of the predictions after the time using values in the time series residual; and outputting, by the computing device, the adjusted predictions of watch behavior for the video.
 2. The method of claim 1, further comprising: determining a prediction interval for the plurality of predictions, the prediction interval including a lower bound and an upper bound for the plurality of predictions.
 3. The method of claim 1, wherein generating the plurality of predictions comprises: receiving a first prediction from a first cell at a second cell, the first prediction for a first time in a series; receiving an input at the second cell, the input based on a second time in the series; and using the first prediction and the input to generate a second prediction for watch behavior for the second time.
 4. The method of claim 3, further comprising: outputting the prediction for the watch behavior for the second time to a third cell, the third cell configured to generate a third prediction for watch behavior for a third time.
 5. The method of claim 1, further comprising: using a plurality of additional layers to process the plurality of outputs to generate the plurality of predictions.
 6. The method of claim 5, wherein the plurality of additional layers comprises dense layers that modify the plurality of predictions from the plurality of cells.
 7. The method of claim 1, wherein each cell includes a plurality of neurons that compute a portion of the prediction for each cell.
 8. The method of claim 1, wherein: each cell includes a plurality of neurons, and outputs from each neuron in a cell are used to determine the prediction for the cell.
 9. The method of claim 8, wherein each neuron in a cell is coupled to another neuron in another cell to provide a prediction to the other neuron.
 10. The method of claim 8, wherein: a neuron receives a previous cell state and a previous cell output for a previous neuron in a previous cell, and the neuron uses the previous cell state and a previous cell output to generate a new cell state and a new cell output.
 11. The method of claim 10, wherein: the neuron weights the previous cell output and combines the weighted previous cell output with the previous cell state to generate the new cell state.
 12. The method of claim 10, wherein: the neuron weights the previous cell output and combines the weighted previous cell output with the new cell state to generate the new cell output.
 13. The method of claim 1, wherein the prediction network is not used after receiving the actual performance data to generate the adjusted predictions.
 14. The method of claim 1, wherein: a look back length of a number of past videos to use is based on a remember probability weight versus an input probability weight of a neuron in a cell, and the remember probability and the input probability are used to weight an output of a previous neuron.
 15. A non-transitory computer-readable storage medium containing instructions, that when executed, control a computer system to be configured for: receiving a plurality of inputs for a video for a plurality of times at a prediction network that includes a plurality of cells; generating a plurality of predictions of watch behavior of the video for the plurality of inputs at the plurality of cells, the plurality of predictions predicting a performance of the video on a video delivery service for the plurality of times, wherein cells in the plurality of cells generate a prediction using an input at a time and a prior prediction from a cell at a previous time; receiving actual performance data generated from users viewing the video on the video delivery service before a time; generating a time series residual for at least a portion of the plurality of predictions from the actual performance data and prior predictions before the time; adjusting at least the portion of the predictions after the time using values in the time series residual; and outputting the adjusted predictions of watch behavior for the video.
 16. The non-transitory computer-readable storage medium of claim 15, further configured for: determining a prediction interval for the plurality of predictions, the prediction interval including a lower bound and an upper bound for the plurality of predictions.
 17. The non-transitory computer-readable storage medium of claim 15, wherein generating the plurality of predictions comprises: receiving a first prediction from a first cell at a second cell, the first prediction for a first time in a series; receiving an input at the second cell, the input based on a second time in the series; and using the first prediction and the input to generate a second prediction for watch behavior for the second time.
 18. The non-transitory computer-readable storage medium of claim 15, wherein: each cell includes a plurality of neurons, and outputs from each neuron in a cell are used to determine the prediction for the cell.
 19. The non-transitory computer-readable storage medium of claim 18, wherein: a neuron receives a previous cell state and a previous cell output for a previous neuron in a previous cell, and the neuron uses the previous cell state and a previous cell output to generate a new cell state and a new cell output.
 20. An apparatus comprising: one or more computer processors; and a non-transitory computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for: receiving a plurality of inputs for a video for a plurality of times at a prediction network that includes a plurality of cells; generating a plurality of predictions of watch behavior of the video for the plurality of inputs at the plurality of cells, the plurality of predictions predicting a performance of the video on a video delivery service for the plurality of times, wherein cells in the plurality of cells generate a prediction using an input at a time and a prior prediction from a cell at a previous time; receiving actual performance data generated from users viewing the video on the video delivery service before a time; generating a time series residual for at least a portion of the plurality of predictions from the actual performance data and prior predictions before the time; adjusting at least the portion of the predictions after the time using values in the time series residual; and outputting the adjusted predictions of watch behavior for the video.